File date: 1992-06-29
══════════════════════════════════════
2. SOURCE CODE GUIDELINES
══════════════════════════════════════
The objective of this unit is to set a framework in
which you are invited to use, respond to, and improve upon, the
techniques and software in the MIR series.
═════════════════════════════════════════════
2.1 Needs of the information searcher
═════════════════════════════════════════════
Good software is user- or market-driven. (We've said
that before!) At this point let's take a closer look at the needs
of a potential user, putting ourselves in the place of a person who
wants to retrieve information quickly and easily from large masses
of data. What are the things that matter? The following items are
mentioned again and again.
The value of time: The time factor shows up in two
ways... learning and system response.
If it takes a course or five days of seminars to learn
how to use a program, most potential users are frightened away.
Buyer resistance sets in if the salesperson doesn't invite you to
take over early in a demonstration. People don't even want to take
the time to read a manual; the majority rarely do!
System response has to do with how much time elapses
until the program reacts to a new instruction from the user. If a
computer program replaces a manual method that took far longer,
people tolerate delays... at first. But over a few months of use,
they grow impatient with slow response. For long-term comfort and
acceptance, three quarters of a second seems to be the threshold of
tolerance. You might think in terms of a "three-quarters of a
second, 95% of the time" rule for computer programming. In other
words, make sure that the program, at least 19 times out of 20,
shows some response to new instructions from the user within three-
quarters of a second. That standard is not as difficult to achieve
as it might appear.
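The rule is easy to check once response times have been
measured. What follows is only a sketch: the function name, the
constant, and the idea of collecting timings in an array are
assumptions made for illustration, not part of the MIR software.

```c
#include <assert.h>
#include <stddef.h>

#define RESPONSE_LIMIT_SECONDS 0.75   /* threshold of tolerance from the text */

/* Returns 1 if at least 19 out of every 20 measured response times
   (95%) are within the limit, 0 otherwise.  The name and the array
   of timings are illustrative assumptions. */
int meets_response_rule(const double seconds[], size_t count)
{
    size_t i;
    size_t within = 0;

    assert(seconds != NULL && count > 0);
    for (i = 0; i < count; i++)
    {
        if (seconds[i] <= RESPONSE_LIMIT_SECONDS)
            within++;
    }
    /* within / count >= 19 / 20, kept in whole numbers to avoid rounding */
    return within * 20 >= count * 19;
}
```

A test harness might time each keystroke-to-display interval
during a trial session, then pass the collected timings to this
function.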
Simplicity, simplicity, simplicity: How do we feel
about a computer program that requires combinations of two or three
keys, such as ALT-Shift-H to get help? Or a program where we
search in vain for any logic or consistency in the instruction set?
So long as we have "user friendly" software like that, we need no
enemies.
Simplicity reduces learning time. Simplicity rescues
the user from becoming hostage to the manual or to instruction
summary cards (especially after not having used the program for a
few weeks). Uniform simplicity across many programs is even
better, because the user becomes free to move easily from one
program to another. To the computer firms that promote intuitive,
uniform simplicity, a vote of thanks!
Achieving simplicity for the user places extra demands
on the designer and programmer. But the trade-off between those
demands and the costs of turning hundreds or thousands of potential
users into reluctant technicians is no contest at all. Programmers
and designers, go the second mile and make it easy for the user.
The marketplace in the long run will reward you for it.
The argument against simplicity is that programs will
lack power that more sophisticated users demand. The answer to
that is to bury the extra power in segments of the program or
behind optional instructions or menus so that it does not intrude
on the person who does not want or is not ready for the extra
features. And when the extra features are turned on, still keep it
simple!
Control: Who is in control... the program or the
person running it? The program is in control:
» if it is possible to get locked into a situation from
which there is no escape except through reading large
sections of the manual (or rebooting);
» if the person has to tab through several fields to get
to the next place where data is to be changed or added;
» if the user is forced to traverse a maze of menus
long after becoming familiar with the program;
» if it takes more than two or three keystrokes or
mouse-clicks to quit the program from anywhere
whatsoever within the program.
These are examples only. People like the feeling that
they themselves are running things, not some faceless programmer or
designer.
Freedom from a ticking clock: One feature of
centralized processing is calculated to devastate the human
psyche... the message when logging out that one has used several
hundred dollars worth of computer time. It rarely hits the
worker's wallet directly, but the message is still received...
guilt, guilt, guilt. It takes the fun out of computer use.
This need argues for more efficient programs. In the
case of centralized processing, an efficient program finishes the
task faster, so the charges for computer resources and any
communication are lower. Truly
efficient programs open the way to distributed processing, where
the user can leave a personal computer running for hours at a
marginal cost of only a few cents for electricity. Communication
costs are nil. Guilt over the ticking clock is gone.
Freedom from obscure error messages: Capital
punishment is outside the scope of tutorials on indexing and
retrieval. But isn't there a feeling of moral satisfaction in
considering the ultimate deterrent for programmers who wish upon us
in mid-program gems like: "CANNOT ALLOC MORE MEMORY"? Somewhat
worse are the error messages that seem to be in plain language but
whose content turns out to have nothing to do with the actual
problem.
Freedom from the curse of codes: What's wrong with
plain language?
Language of choice: There are two ways for programmers
to accommodate users' preference for programs operating in their
native language. For programs that are not interactive, it is
sufficient if source code is available so that error messages and
the program description message can be translated and the program
recompiled. A well written interactive program has all its screen
text, prompts, instructions and help messages in a separate file.
This text file may be translated into other languages. Careful
attention must be paid to spacing so that messages fit on the
screen in the space assigned by the program. These files are then
named according to a convention that is easily recognized by the
program. Example: ENGLISH.LNG, ESPANOLA.LNG, FRANCAIS.LNG, etc.
If only one file is present, the program automatically appears in
that language only. If multiple such files are in the same
computer directory, the user is offered a choice of languages
immediately when the program starts up.
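The start-up check described above might be sketched as
follows. ANSI C has no portable call for listing a directory, so
this sketch simply tries to open each name from a fixed list. The
list, the function names, and the return conventions are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define LANGUAGE_COUNT 3

/* Hypothetical list of the language files this program recognizes. */
static const char *language_files[LANGUAGE_COUNT] =
{
    "ENGLISH.LNG", "ESPANOLA.LNG", "FRANCAIS.LNG"
};

/* Marks in found[] which language files can be opened in the
   current directory, and returns how many there are. */
int find_language_files(int found[LANGUAGE_COUNT])
{
    int i;
    int count = 0;
    FILE *fp;

    assert(found != NULL);
    for (i = 0; i < LANGUAGE_COUNT; i++)
    {
        fp = fopen(language_files[i], "r");
        found[i] = (fp != NULL);
        if (fp != NULL)
        {
            fclose(fp);
            count++;
        }
    }
    return count;
}

/* Returns the index of the only language present, or -1 when there
   is none or more than one (the caller then reports the problem or
   offers the user a menu). */
int only_language(const int found[LANGUAGE_COUNT])
{
    int i;
    int chosen = -1;

    for (i = 0; i < LANGUAGE_COUNT; i++)
    {
        if (found[i])
        {
            if (chosen != -1)
                return -1;      /* a second file: the user must choose */
            chosen = i;
        }
    }
    return chosen;
}
```

When only_language() returns an index, the program can load
that file without asking. Otherwise it would show a menu built from
the entries flagged in found[].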
Context-sensitive help: This feature has become quite
common. The program user may touch a single key (often the F1
function key on personal computers) to get helpful instructions.
These help messages are context-sensitive if their content depends
on what options are open at that point in the program.
More bang per computer dollar: Personal computers have
revolutionized the economics of information. Where access is
needed to up-to-the-minute data (such as seats available on an
airline flight), processing has to be centralized on large
machines. That's expensive... the central mainframe computers, the
communications costs, and the ability to transact data changes that
become accessible to other people immediately. Personal computers
come into their own where data can be downloaded or distributed,
and used at the individual's pleasure. You don't need a million
dollar computer to search using an index; a personal computer
costing less than a thousand dollars is capable of nearly instant
response. Even the complex task of creating an index for a large
quantity of data requires only a moderate amount of computer
"horsepower." Personal computers can carry out quite sophisticated
chores. For example, statistical analysis of survey questionnaires
is in some cases a simple extension of the use of high speed
indexes.
═════════════════════════════
2.2 Design background
═════════════════════════════
Every computer program represents a series of design
decisions. In order to understand more readily MIR technology and
software, you may find some background helpful.
Squeezing each bit... the conservationist start: I
first programmed in 1964 on an IBM 1440 which had 4,096 bytes of
available RAM. That's for the program and the data. One quickly
develops a mindset under these conditions... make every bit within
every byte count for as much as possible. During the 1970s the
prevailing attitude was to pour hardware resources lavishly on any
computer problem. I didn't get on the bandwagon. Effectiveness
and efficiency still mattered. For example, when developing an
early fourth generation language, I took a kind of perverse pride
in squeezing data types into minimum byte counts... dates in 16
bits, postal and zip codes coexisting in 27 bits, area codes and
telephone numbers in 31 bits (achieved by swapping the first two
digits in the area code; it works because the second digit of an
area code is zero or one).
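The telephone number trick can be reconstructed from that
description. With the middle digit (0 or 1) moved to the front, the
ten digits read as a number no greater than 1,999,999,999, which is
below 2 to the 31st power (2,147,483,648). The sketch below is one
reading of the idea, not the original code. The date layout (7 bits
for years since 1900, 4 for the month, 5 for the day) is purely an
assumption, since the text does not say how the 16 bits were
divided. All function names are hypothetical.

```c
#include <assert.h>

/* Packs a 3-digit area code and 7-digit local number into a value
   below 2^31 by swapping the first two digits of the area code.
   Relies on the second digit of the area code being 0 or 1. */
unsigned long pack_phone(int area, long local)
{
    int first = area / 100;
    int second = (area / 10) % 10;
    int third = area % 10;
    int swapped;

    assert(area >= 0 && area <= 999);
    assert(second == 0 || second == 1);
    assert(local >= 0L && local <= 9999999L);

    swapped = second * 100 + first * 10 + third;    /* at most 199 */
    return (unsigned long) swapped * 10000000UL + (unsigned long) local;
}

/* Reverses pack_phone(), swapping the two digits back. */
void unpack_phone(unsigned long packed, int *area, long *local)
{
    int swapped = (int) (packed / 10000000UL);

    *local = (long) (packed % 10000000UL);
    *area = ((swapped / 10) % 10) * 100 + (swapped / 100) * 10
            + swapped % 10;
}

/* Packs a date into 16 bits: yyyyyyym mmmddddd (assumed layout,
   years 1900 through 2027). */
unsigned int pack_date(int year, int month, int day)
{
    assert(year >= 1900 && year <= 2027);
    assert(month >= 1 && month <= 12);
    assert(day >= 1 && day <= 31);
    return ((unsigned int) (year - 1900) << 9)
         | ((unsigned int) month << 5)
         | (unsigned int) day;
}
```

For example, area code 416 is stored as 146, so the number
416 555-1234 packs to 1,465,551,234.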
The gigabyte years: Since early in the 1980s the
majority of my computer work has been in connection with databases
in the 60 million to 3 billion byte range. The CD-ROM world
seemed an invitation to be extravagant; space is plentiful. But
efficiency has an interesting payoff. It affects retrieval
timings. Why go out to the disc multiple times or why take the
time to fill huge buffers on each access? Working with compressed
indexes reduces mechanical head movement, by far the greatest time
factor. And compressed indexes can be used for remarkably faster
Boolean operations. More on that later.
UNIX influence: I created the database system that
later became known as FindIT under the Primos operating system,
then shifted in 1985 to UNIX. Predecessor versions of many MIR
software routines were written in the UNIX environment. One moves
in a world of byte streams, pipes, and bit manipulation. DOS by
contrast is oriented toward printable data; witness the fact that
binary data requires special declaration in DOS and cannot be fed
through pipes. (Stdin and stdout in DOS insert a carriage return
in the data whenever DOS encounters a binary byte which happens to
be a linefeed; behavior when a binary CTL-Z is found is even more
unpleasant.) The MIR programs are DOS versions, but the UNIX-style
thinking will show through.
C with a FORtran accent: It's common to learn a
variety of languages when involved with computer programming over
an extended period. My early set included Autocoder, Assembler,
Basic, Cobol, and various forms of machine language. FORtran was
my language of choice from 1968 to 1985. That long in one language
gives one an accent when moving on into another language. C
language was the logical choice for portability and for efficient
control over byte streams. FORtran doesn't promote modularity as
much as C; it's more linear in its thought forms. A C purist might
be shocked at inelegancies and "FORtranisms" in my C code. Fair
enough, but remember that it works! (As Billy Sunday is reputed to
have answered a critic: "I like the way I'm doing it better than
the way you are not doing it.")
≡≡≡≡->> QUESTION:
If you are expert in C++ or in any approach to object
oriented software, you are strongly encouraged to
provide alternative coding to any programs offered in
MIR. Object oriented software is gaining ground
rapidly. C++ language is nearing critical mass in
terms of acceptance and numbers of people competent in
its use.
<<-≡≡≡≡
════════════════════════════
2.3 Design decisions
════════════════════════════
Here then are design decisions that are built into MIR
software. You are not bound by them. But they affect you to the
extent you may use this material as a starting point.
Language: C language is currently the language of
choice for widest portability, at least in North America, and
probably the world. It offers somewhat less power than Assembler
and less clarity than Basic. C doesn't get the preferential
treatment given to Pascal in the Macintosh environment.
Nonetheless C is likely to serve the needs of the widest spectrum
of potential users of computerized indexing and retrieval.
Hardware: The executable versions are compiled for use
on IBM-compatible personal computers. This starting point provides
the widest access, since there are more PCs around than all others
put together.
Operating system and compiler: Again, access by the
widest number of potential users favors DOS. I am using
Microsoft's DOS Version 5.0 in combination with the Microsoft C 6.0
Programmer's Workbench compiler. Switches are set for ANSI C (to
eliminate code unique to Microsoft C) and 8086 runtime operability.
Avoiding code that blows up: Some practices, while
common in C language, lead to messy situations when porting to
other environments. For example, the "varargs.h" files in Sun and
DOS differ; the use of "varargs.h" leads to inconsistencies when
moving from one computer to the other. Therefore variable argument
routines are not used. (One price... warning and error routines
are less elegant.) Dynamic memory allocation also has been
dropped. (Sorry about that.) And as discussed earlier, the
vagaries of DOS reduce dramatically the ability to pipe binary byte
streams. Stdin and stdout have been used only when there is
reasonable certainty that printable files only are involved. To
anyone working in UNIX, feel free to change back; then you can use
series of pipes and avoid the successive creation and deletion of
work files.
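To make these choices concrete, here is a minimal sketch of
a routine written under the same constraints: a buffer whose size
is fixed by a #define rather than allocated dynamically, files
opened with the binary flag so that DOS leaves linefeed and CTL-Z
bytes alone, a work file in place of a pipe, and an error routine
with a fixed argument list instead of "varargs.h". The names
copy_bytes, report_error and BUFFER_SIZE are assumptions for
illustration; they are not taken from the MIR sources.

```c
#include <assert.h>
#include <stdio.h>

#define BUFFER_SIZE 2048    /* fixed at compile time; raise it if desired */

static unsigned char buffer[BUFFER_SIZE];   /* no malloc() required */

/* Error routine with a fixed argument list instead of varargs. */
void report_error(const char *action, const char *file_name)
{
    fprintf(stderr, "Sorry, unable to %s the file \"%s\".\n",
            action, file_name);
}

/* Copies a file byte for byte into a work file.  Returns the number
   of bytes copied, or -1 if something went wrong. */
long copy_bytes(const char *source_name, const char *work_name)
{
    FILE *in;
    FILE *out;
    size_t got;
    long total = 0L;

    assert(source_name != NULL && work_name != NULL);
    in = fopen(source_name, "rb");      /* "b": binary, no translation */
    if (in == NULL)
    {
        report_error("open", source_name);
        return -1L;
    }
    out = fopen(work_name, "wb");
    if (out == NULL)
    {
        report_error("create", work_name);
        fclose(in);
        return -1L;
    }
    while ((got = fread(buffer, 1, BUFFER_SIZE, in)) > 0)
    {
        if (fwrite(buffer, 1, got, out) != got)
        {
            report_error("write to", work_name);
            fclose(in);
            fclose(out);
            return -1L;
        }
        total += (long) got;
    }
    fclose(in);
    if (fclose(out) != 0)
    {
        report_error("finish writing", work_name);
        return -1L;
    }
    return total;
}
```

Under UNIX the "b" in the mode strings is accepted but has no
effect, so the same routine compiles unchanged there.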
═══════════════════════
2.4 Conventions
═══════════════════════
Humans use programs: Any MIR program responds with an
explanation, up to one screen in length, detailing what the program
is intended to do and what arguments it expects. If you wish to
see this quick overview for any program, input the program name
followed by a space and either /U or a question mark. Example:
A_BYTES ?
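A program might recognize that request along these lines.
This is a sketch only: the function names, the acceptance of a
lower case /u, and the sample text for a made-up program called
SAMPLE are assumptions, not the wording used by any MIR program.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the first argument asks for the usage overview
   (/U, /u or a question mark), 0 otherwise. */
int wants_usage(int argc, char *argv[])
{
    assert(argv != NULL);
    if (argc < 2)
        return 0;
    return strcmp(argv[1], "/U") == 0
        || strcmp(argv[1], "/u") == 0
        || strcmp(argv[1], "?") == 0;
}

/* Prints the overview for a hypothetical program named SAMPLE. */
void show_usage(FILE *out)
{
    fputs("SAMPLE   Counts the lines in a text file.\n", out);
    fputs("Usage:   SAMPLE file_name\n", out);
    fputs("         SAMPLE ?         (shows this explanation)\n", out);
}
```

A main() routine would call wants_usage() before doing
anything else, print the overview with show_usage(stdout), and
exit. Whether a program run with no arguments at all should also
show the overview is left to the caller here.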
Normal courtesy to the end user of a program suggests
that we avoid inhuman messages, either obscure or misleading. By
the same token, no program is to have an inescapable situation,
that is, one that gives no direction to the user how to undo a path
or leave a program. Depending on handlers installed, CTL-C may or
may not act as an escape. The most reliable approach is to write
code so that the program always responds to the escape key (ESC).
Humans read programs: Brian Kernighan and Dennis
Ritchie are undoubtedly very fine fellows. Whatever induced them
to inflict on humanity their weird convention in placing brace
brackets? Programs should be for people in every way... including
readability for programmers. In MIR C code, matching brace
brackets are always vertically aligned and their content indented
so that one can see at a glance their range.
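As an illustration, here is a small routine laid out that
way. The text gives no sample, so the exact layout (each brace on a
line of its own, in the same column as the statement that opens the
block) is one reading consistent with the description; the routine
itself is invented for the purpose.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Counts the words in a line of text, where words are separated
   by blanks, tabs or other white space. */
int count_words(const char *line)
{
    int words = 0;
    int in_word = 0;

    assert(line != NULL);
    while (*line != '\0')
    {
        if (isspace((unsigned char) *line))
        {
            in_word = 0;
        }
        else if (!in_word)
        {
            in_word = 1;        /* first character of a new word */
            words++;
        }
        line++;
    }
    return words;
}
```

Running an eye down any column shows where each block begins
and ends.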
Comments, reasonable variable names, descriptive
declarations, full descriptions at the top of source code as to
function, input and output all add to the usefulness of programs.
══════════════════════════════
2.5 Use it, improve it
══════════════════════════════
Let's turn to what you might do with this material.
It's here for you to use. No royalties need be paid to
anyone. Identify your needs in the area of indexing and retrieval,
then select from what is offered in successive MIR tutorial
releases to match your needs.
As a first time user, you have something of value... a
fresh perspective. Are there parts of the tutorials that you find
confusing? Which programs warrant more explanation? May parts be
safely omitted? Please share your thoughts! An ASCII text file
"RESPONSE" is included on the diskettes with this tutorial. Make
a copy of "RESPONSE", edit in your comments, and send it by FAX,
electronic mail, or regular mail to an address listed at the bottom
of the RESPONSE template. (Addresses also appear near the top of
each source code listing.)
You may have noticed on the copyright page and response
template that Marpex has not invited telephone calls. This is not
a discourtesy, but a protection. FAX and electronic mail permit
time shifting; the voice phone does not. I recommend to you a
little book entitled "Peopleware: Productive Projects and Teams"
(Tom DeMarco and Timothy Lister... New York: Dorset House
Publishing, 1987). Look especially at Chapter 8: "You Never Get
Anything Done Around Here Between 9 and 5", and Chapter 11: "The
Telephone".
Note that the tutorials and the source code are made
available to you under two different sets of rules. The tutorials
are shareware; they may be freely copied, but not changed. Your
suggestions concerning the text should be sent to the author.
Under "copyleft" rules, the source code may be changed and
redistributed widely. As a courtesy, please share your source code
changes and new programs with us. We will add all the material
that seems relevant and helpful to later releases. All software
that you provide must, of course, come under the copyleft rules in
order for us to distribute it. (And please, please, please, send
us only source code for which you have rights to share.)
If you work on different equipment and/or under a
different operating system, you are invited to port the code to
that environment. In the final CD-ROM version we will set up
subdirectories as needed for alternate operating systems.
Incidentally, the port to Sun Microsystems UNIX is very simple:
take the binary flags out of file opening sequences, and increase
the values in the #define statements if you want larger buffers.
Please send source code in machine-readable form,
either by electronic mail or on a floppy diskette by regular mail.
To get maximum attention for your efforts, please observe the
conventions for software outlined above.
≡≡≡≡->> QUESTION:
Are there ways we could improve this process, and still
stick to the schedule of releases and the copyleft
ground rules? Your thoughts, please.
<<-≡≡≡≡
This is the shape of cooperative development. Join in,
share, have fun.